Generalized Unique Reconstruction from Substrings
Abstract
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases, in which all substrings of pre-defined lengths are read or substrings are read with no overlap, for the single string case, this work studies two extensions of this paradigm. The first extension considers a setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length $\ell$ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present constructions of such codes and show that their rates approach 1 for values of $\ell$ that asymptotically behave like the lower bound.
Similar resources
Minimum Unique Substrings and Maximum Repeats
Unique substrings appear scattered in the stringology literature and have important applications in bioinformatics. In this paper we initiate a study of minimum unique substrings in a given string; that is, substrings that occur exactly once while all their substrings are repeats. We discover a strong duality between minimum unique substrings and maximum repeats which, in particular, allows fas...
Tight Bounds on the Maximum Number of Shortest Unique Substrings
A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S, if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs...
Reconstructing Strings from Substrings
We consider an interactive approach to DNA sequencing by hybridization, where we are permitted to ask questions of the form "is s a substring of the unknown sequence S?", where s is a specific query string. We are not told where s occurs in S, nor how many times it occurs, just whether or not s is a substring of S. Our goal is to determine the exact contents of S using as few queries as possible. ...
Tight bound on the maximum number of shortest unique substrings
A substring Q of a string S is called a shortest unique substring (SUS) for position p in S, if Q occurs exactly once in S, this occurrence of Q contains position p, and every substring of S which contains position p and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query position p all the SUSs for position p can ...
Finding Characteristic Substrings from Compressed Texts
Text mining from large scaled data is of great importance in computer science. In this paper, we consider fundamental problems on text mining from compressed strings, i.e., computing a longest repeating substring, longest non-overlapping repeating substring, most frequent substring, and most frequent non-overlapping substring from a given compressed string. Also, we tackle the following novel p...
Journal
Journal title: IEEE Transactions on Information Theory
Year: 2023
ISSN: 0018-9448, 1557-9654
DOI: https://doi.org/10.1109/tit.2023.3269124